A. Introduction


I. STPLNR

STPLNR is a package for sustainable transport planning with R Studio. It provides functions for solving common problems in transport planning and modelling, such as how to best get from point A to point B. The overall aim is to provide a reproducible, transparent and accessible toolkit to help people better understand transport systems and inform policy in transportation planning. The purpose of this assignment is to develop a series of analyses that turn “raw” origin-destination data into meaningful results. Our study area will be University City, and the analyses will describe worker flows from the greater Philadelphia region into University City. The census tracts identified for the University City area are: 42101980000, 42101009000, 42101036900, 42101008801, 42101008802. This assignment also includes exploring the flow of traffic through desire lines and how metrics like employment and income may affect transit patterns in this area of the city.


B. Set-Up


I. Installing Packages

rm(list=ls())
library(tidyverse)
library(sf)
library(RSocrata)
library(viridis)
library(spatstat)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(knitr)
library(kableExtra)
library(tidycensus)
library(classInt)   
library(kableExtra)
library(RColorBrewer)
library(hrbrthemes)
library(ggthemes)
library(ggplot2)
library(plotly)
library(scales)
library(extrafont)
library(basetheme)
library(RColorBrewer)
library(lemon)
library(data.table)
library(GGally)
require(gridExtra)
library(ggalt)
library(rgeos)
library(sp)
library(smoothr)
library(rgdal)
library(rgeos)
library(maptools)
library(ggmap)
library(scales) 
library(ggeasy)
library("wesanderson")
library(spdep)
library(caret)
library(ckanr)
library(FNN)
library(grid)
library(ggcorrplot) 
library(corrr)      
library(kableExtra)
library(jtools)     
library(ggstance) 
library(ggpubr)    
library(broom.mixed) 
library(tab)
library(plotly)
library(stargazer)
library(pscl)
library(plotROC)
library(pROC)
library(lubridate)
library(sf)
library(lubridate)
library(tigris)
library(riem)
library(gridExtra)
library(knitr)
library(kableExtra)
library(vtable)
library(RSocrata)
library(stplanr)
library(leaflet)

col <- wes_palette("Moonrise2")
col5 <- colorRampPalette(col)(5)
col6 <- colorRampPalette(col)(6)
col7 <- colorRampPalette(col)(7)
col8 <- colorRampPalette(col)(8)
col10 <- colorRampPalette(col)(10)
col22 <- colorRampPalette(col)(22)
col29 <- colorRampPalette(col)(29)
col96 <- colorRampPalette(col)(96)

options(scipen = 999)
options(tigris_class = "sf")

root.dir = "https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/DATA/"

We started creating the markdown by installing the required packages and loading them into the library. Some of the mainly used packages arestplnr, tidyverse, tidycensus, and dplyr. The final deliverable for this homework is an RMarkdown file with the required plots and tables, knitted to a HTML document.


C. Section 1: Defining a Study Area


I. Pulling in the Data

The map below shows the centroids of each county in the state of Pennsylvania. We used the pa_tract_geo.txt file, which contained data on the census tracts in PA as well as the coordinates of their centroids.

OD <- read.csv("~/STUDY/Coursework/Fall 2022/Planning_Methods/stplanr_export/pa_od_2017.csv",
               colClasses = c("h_geocode" = "character", "w_geocode" = "character")) %>%
  rename(trct_org = h_geocode,
         trct_des = w_geocode) 

OD<- OD %>% group_by(trct_org, trct_des) %>% summarise(n_com = n())

tract <- fread("~/STUDY/Coursework/Fall 2022/Planning_Methods/stplanr_export/pa_tract_geo.txt", select = c("GEOID", "INTPTLAT", "INTPTLONG"), colClasses = c("GEOID" = "character"))

tract <- st_as_sf(tract, coords = c("INTPTLONG", "INTPTLAT"), crs = 4326, agr = "constant")

ggplot()+
  geom_sf(data = tract, color="#e86f00", size=1.5)+
  labs(title="Centroid for Census Tracts",
       subtitle = "Pennsylvania, 2017",
       caption = "Longitudinal Employer Household Dyanamics Origin-Destination Employment Statistics (LODES)")+
  theme_void()

After taking a look at the geography we were working with, we then proceeded to narrow our focus down to university city. We used the Philadelphia 2035 plan from the Philadelphia City Planning Commission (PCPC) to decide what census tracts would be included in the University City region. The fips codes for these tracts were: 42101980000, 42101009000, 42101036900, 42101008801, 42101008802. We cleaned the GEOID columns to make it similar across the data sets we were working with, and then filtered the column trct_des (destination tracts), to include Univeristy City. This meant that we were looking at trips that originated in the Greater Philadelphia Region and ended in University City.

phl<-st_read("https://opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson")

ggplot()+
  geom_sf(data = phl)+
  labs(title="Philadelphia",
       subtitle = "Pennsylvania",
       caption = "Open Data Philly")+
  theme_void()

II. Building Desire Lines

This section is where we take a look at desire lines that originate in the Greater Philadelphia Region, with its destinations in University City. We re-projected the desire lines to the projection that the Philadelphia data set was in and then proceeded to visualize it. We can observe that there is a lot of clustering and overlap of the desire lines with destinations in University City, which can be attributed to the number of educational, medical and commercial establishments in the area. To make the visualization a little easier, we have included an interaction map through the leaflet package that enables you to zoom in and view the O-D pair patterns more closely.

OD<- OD %>% separate(trct_org, into = c('trct_org', 'B'), sep = -4, convert = TRUE) %>%
  separate(trct_des, into = c('trct_des', 'C'), sep = -4, convert = TRUE) %>%
  select(c(trct_des, trct_org,n_com))

OD <- OD %>% filter(trct_des== "42101980000" | trct_des == "42101009000" 
                    | trct_des == "42101036900" | trct_des == "42101008801" 
                    | trct_des == "42101008802")
  

dat_red<- OD[OD$trct_org %in% tract$GEOID & OD$trct_des %in% tract$GEOID, ]

dat_dsire<- od2line(flow = dat_red, zones = tract)
map1 <- leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(data = dat_dsire, color = "red",weight = 0.3, fillOpacity = 0.5)
map1
desire<- st_as_sf(dat_dsire)

map2 <- leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(data = desire, color = "green",weight = 0.3, fillOpacity = 0.5)
map2
desire$dist_units<- st_length(desire)
desire$dist<- as.numeric(st_length(desire)) 

desire_proj<- st_transform(desire, crs = 2272) 

desire_proj$dist_ft<- st_length(desire_proj) 

D. Section 2: Descriptive Statistics


I. Talking about the variables

In this section, we looked at different variables we would like to consider in this analysis. We picked variables that we thought would have an effect on O-D travel patterns throughout the city. Each set of variables is divided into 3 categories: binary variables, continuous variables, and categorical variables. Each category of variables will have a table of summary statictics provided in this section.

# Binary Variables

desire$iphone<-sample(c("Yes", "No"), nrow(desire), replace = TRUE)

desire$vehicle<-sample(c("Yes", "No"), nrow(desire), replace = TRUE)

# Continuous Variable s

desire$n_com<- as.numeric(desire$n_com) 

x <- 30000:80000
desire$medinc<-sample(x, nrow(desire), replace = TRUE)

y <- 10:70
desire$pctwork<-sample(y, nrow(desire), replace = TRUE)

# Categorical Variables

desire$edu<-sample(c("High school", "Bachelors", "Masters", "PhD"), nrow(desire), replace = TRUE)


desire$transit<-sample(c("Walk", "Bike", "Car-Pool", "Private Vehicle", "Public Transit"), nrow(desire), replace = TRUE)
table(desire$iphone)

The binary variables we have chosen are whether the commuter uses an iPhone or not and whether the commuter uses public transit to get to the University City destination. Binary variables are characterized by a (1,0) value, or in this case- a (Yes, No) value.

st(desire, vars = c('iphone','vehicle'),title = 'Binary Variables')
Binary Variables
Variable N Percent
iphone 38168
… No 19032 49.9%
… Yes 19136 50.1%
vehicle 38168
… No 19121 50.1%
… Yes 19047 49.9%

We decided to look at 3 continuous variables for this section- which included median household income, number of commuters and percent of people employed in the census tract that we are looking at. These continuous variables focus primarily on work and employment, as a large number of commuters could be travelling into University City for work purposes. These variables could be beneficial in looking at how indicators like income and employment affect O-D demand.

st(desire, vars = c('medinc','n_com','pctwork'), title='Continuous Variables')
Continuous Variables
Variable N Mean Std. Dev. Min Pctl. 25 Pctl. 75 Max
medinc 38168 54905.33 14413.247 30008 42383.75 67357 80000
n_com 38168 1 0 1 1 1 1
pctwork 38168 39.975 17.629 10 25 55 70

The two categorical variables we have chose are the level of educational attainment and the mode of commute. These variables are more individual specific and could be useful in discerning patterns of travel to and from the area of study.

st(desire, vars = c('edu','transit'), title='Categorical Variables')
Categorical Variables
Variable N Percent
edu 38168
… Bachelors 9557 25%
… High school 9613 25.2%
… Masters 9492 24.9%
… PhD 9506 24.9%
transit 38168
… Bike 7611 19.9%
… Car-Pool 7664 20.1%
… Private Vehicle 7524 19.7%
… Public Transit 7664 20.1%
… Walk 7705 20.2%

E. Section 3: Continuous Metrics


I. Mapping the Continuous Variables

In the maps below we can see that the desire lines for median income, number of commuters and percentage of people employed seem to be fairly evenly distributed. We cannot be sure about the clustering of high and low values, because the variable values were assigned to each census tract at random, which does take away a little bit of nuance. There does seem to a higher number of O-D trips coming from the north-west region and some trips from South Philly. There is a distinctive lack of travel from the norther portion of our study area.

bins1 <- c(0, 30000, 40000, 70000, 80000)
pal1 <- colorBin("viridis", domain = desire$medinc, bins = bins1)
map3 <- leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(data = desire, color =~pal1(medinc) ,weight = 0.4)%>% addLegend(pal = pal1, values = desire$medinc, opacity = 0.7, title = "Median Income",
  position = "bottomright")
      
map3

In order to plot the map above for median income distribution we created bins of income below $30000, $30000 to $40000, $40000 to $70000 and greater than $70000. Since the sampling is random rather than a collected data set therefore the plots do not show the actual picture of the travelling patters. But this plot is a demonstration of how travel patters can be observed in terms of median income and help us get a better understanding of the relationship of median income to travel behavior for different parts of the geography. For instance here in the plots we see that the share of people with median income between $40000 to $70000 contribute to large number of trips. Also, the more concentration of these trips are coming from west and north-west region of the study area.

bins2 <- c(0, 20, 45, 70)
pal2 <- colorBin("magma", domain = desire$pctwork, bins = bins2)
map4 <- leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(data = desire, color =~pal2(pctwork) ,weight = 0.4)%>% addLegend(pal = pal2, values = desire$pctwork, opacity = 0.7, title = "Percent Working Pop",
  position = "bottomright")
      
map4

For the percentage of working population and to understand their travel behavior we see that working population percentage between 20 to 45 have the highest share of the trips. Additionally, working population of less than 20 percent contribute the lowest. This makes sense, however this is a randomly sampled data.


F. Section 4: Categorical Metrics


I. Mapping the Categorical Variables

In the maps below we can see that the desire lines for categorical variables like level of education and mode of commute, that were previously defined in Section 2. Despite lacking the statistical power that continuous variables have, these categorical variables are more tailored to the population that we are studying. Categorical data also gives us the ability to quickly recognize trends, changes and patterns based on inter-related variables, such as the ones we have used in this section. Some other categorical variables we would like to consider in a future study would be gender, age and race.

pal3 <- colorFactor(palette = c('red', 'yellow','black','blue'), domain = desire$edu)
map5 <- leaflet(desire) %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(color = ~pal3(edu) ,weight = 0.4)%>% addLegend(pal = pal3, values = ~edu, opacity = 0.7, title = "Education background",
  position = "bottomright")
      
map5
pal4 <- colorFactor(palette = c('red', 'yellow','black','blue', 'green'), domain = desire$transit)
map6 <- leaflet(desire) %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(color = ~pal4(transit) ,weight = 0.4)%>% addLegend(pal = pal4, values = ~transit, opacity = 0.7, title = "Transit Choice",
  position = "bottomright")
      
map6

G. Section 5: Binary Metrics


I. Mapping the Binary Variables

In the maps below we can see that the desire lines for binary variables like whether the commuter uses an iPhone and whether the commuter takes public transit. The advantages of using binary variables to inform the analysis in this section is that it is easier to group users based on defining characteristics that could potentially further our study. The disadvantage is that a lot of nuance is lost if we decide to classify all our numeric variables into binaries. It works well in this case because we are asking simple ‘yes’ or ‘no’ questions.

pal5 <- colorFactor(palette = c('red', 'black'), domain = desire$iphone)
map7 <- leaflet(desire) %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(color = ~pal5(iphone) ,weight = 0.4)%>% addLegend(pal = pal5, values = ~iphone, opacity = 0.7, title = "Iphone ownership",
  position = "bottomright")
      
map7
pal6 <- colorFactor(palette = c('blue', 'green'), domain = desire$vehicle)
map8 <- leaflet(desire) %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
  addPolylines(color = ~pal6(vehicle) ,weight = 0.4)%>% addLegend(pal = pal6, values = ~vehicle, opacity = 0.7, title = "Vehicle ownership",
  position = "bottomright")
      
map8

H. Section 6: Preliminary Conclusions


  • From our findings we noticed that places like University City, where major anchor institutions like universities and hospitals are located- are major employment hubs in a city. It is only natural that a large number of trips seem to cluster in this area.

  • we believe that to further understand the travel patters and its distribution with different demographic layers overlapped could help provided better insights to the study. And therefore, we tried combining the data sets we think have significant correlations with travel behavior.

  • From our findings in this initial study, we can see that variables related to income, employment and mode of commute play an important role in defining O-D patterns in a city.

  • In conclusion we think that travel patterns play a significant role in understanding an area’s attractions and necessities inviting people from different surrounding regions.

  • Final project that we are working on is about passive data in transportation and we believe that this assignment helped us think about different variables that might affect the travel behavior and get more informed insights for planning process. And we believe that passive data techniques can help towards working in recording other different attributes which have not been previously collected on a more granular level, helping to study travel patterns better.